Fundamentals of Missing Data in Evaluation

Presentation to MSU Department of Psychology, Program Evaluation Occasional Speaker Series, East Lansing, MI

Steven J. Pierce

Center for Statistical Training and Consulting

2024-12-05

Outline

  • What is missing data?
  • Why do we end up with missing data?
  • Why should we care about missing data?
  • How can we diagnose the missing data issues for a given study?
  • What should we do about missing data?

What is missing data?

Missing data (MD) are measurements you wanted or intended to collect but did not obtain.[1]

  • Having MD is common in research & evaluation studies.
  • If you do much evaluation work, you will run into MD.

Why do we end up with missing data?

Data collection doesn’t always go according to plan…

  • Human factors: participant behavior, evaluator errors, partner behavior
  • Other factors: equipment failures, records/databases, unusual events

Missing Data & Project Lifecycle

Diagram: Study planning & design → Data collection → Data entry → Data storage & management → Data analysis

Why should we care about missing data?

Ethics for Evaluators

Handling missing data well enacts our guiding principles[2]:


  • Systematic inquiry
  • Competence
  • Integrity

Scientific Activities[3]

There are 3 major scientific activities that can be affected by missing data.

  • Making structured observations of constructs.
  • Using observations to draw inferences about relationships between constructs.
  • Generalizing the results to populations beyond the collected sample.

Consequences for Measurement[3]

  • Availability of constructs
  • Decreased reliability due to increased error variance
  • Bias from poor content coverage
  • Construct validity

Consequences for Internal Validity[1,3]

  • Selection bias
  • Compromised randomization
  • Power and precision
  • Inaccurate model assumptions

Consequences for Generalizability[1,3]

A representative sample is crucial to generalizing to the intended population!

  • Theory development & cumulative knowledge
  • Policy & decision-making

How can we diagnose the missing data issues for a given study?

Cattell’s Data Box[4]


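How much data is there? The data volume is \(N_{values} = P \times V \times T\), where \(P\) is the number of persons, \(V\) the number of variables, and \(T\) the number of time points (occasions).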

  • Slices through the cube represent subsets of data.
  • Constructs are often measured by groups of adjacent variables (items).
  • Missing data puts holes in your cube!
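The data volume formula and the idea of holes in the cube can be made concrete with a small sketch. It assumes a rectangular cube stored as nested Python lists indexed cube[p][v][t], with None marking a missing value; data_volume and proportion_missing are hypothetical helper names, not functions from any package.

```python
from typing import List, Optional

# Hypothetical data box indexed cube[p][v][t]; None is a "hole" (missing value).
Cube = List[List[List[Optional[float]]]]

def data_volume(cube: Cube) -> int:
    """N_values = P * V * T, assuming every person has the same V x T grid."""
    P = len(cube)
    V = len(cube[0])
    T = len(cube[0][0])
    return P * V * T

def proportion_missing(cube: Cube) -> float:
    """Share of the P * V * T cells that are missing (None)."""
    n_missing = sum(
        1
        for person in cube
        for variable in person
        for value in variable
        if value is None
    )
    return n_missing / data_volume(cube)

# 2 persons x 3 variables x 2 time points (made-up values)
cube = [
    [[4.0, 5.0], [3.0, None], [2.0, 2.0]],
    [[None, None], [1.0, 2.0], [5.0, 4.0]],
]
print(data_volume(cube))         # 12
print(proportion_missing(cube))  # 0.25
```

Slicing the same structure (for example, all persons and variables at one t) gives the subsets of data mentioned above.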

Types of Missingness[3,5]

  • Item level
  • Construct level
  • Person level (unit non-response)
  • Person-period level (wave nonresponse; intermittent vs. dropout)
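The construct-level and person-period distinctions can be operationalized from the raw data. The sketch below uses hypothetical helper names and one possible set of rules; in particular, a person with both a gap and a later dropout is labeled intermittent here, and other studies may define these categories differently.

```python
from typing import List, Optional, Sequence

def construct_missing(items: Sequence[Optional[float]]) -> bool:
    """Construct-level missingness: every item for the construct is missing."""
    return all(v is None for v in items)

def classify_waves(observed: List[bool]) -> str:
    """Classify one person's wave-level (person-period) missingness.

    observed[t] is True if the person provided any data at wave t.
    """
    if all(observed):
        return "complete"
    if not any(observed):
        return "unit nonresponse"  # no data at any wave
    last_seen = max(t for t, seen in enumerate(observed) if seen)
    if all(observed[: last_seen + 1]):
        return "dropout"  # observed up to some wave, then missing to the end
    return "intermittent"  # a missing wave before a later observed wave

waves = {
    "A": [True, True, True],
    "B": [True, False, True],
    "C": [True, True, False],
    "D": [False, False, False],
}
for person, observed in waves.items():
    print(person, classify_waves(observed))

print(construct_missing([None, None, None]))  # True: whole construct missing
print(construct_missing([2.0, None, 4.0]))    # False: only item-level missingness
```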

Describing the Amount of MD[3,6]

Report numbers & percentages of:

  • Participants w/ any data at each time point (retention/attrition)
  • Complete vs. incomplete cases (overall & by time point)
  • Missing values for each variable & construct
  • Reasons for attrition/dropout and other missing data
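Most of these counts can be computed directly from long-format data. This is a minimal sketch on made-up data; the variable names (anx, dep), the helper summarize, and the enrolled count are hypothetical. It assumes at most one row per person-wave and that every planned wave appears in the data, and it counts absent person-wave rows as missing values. Reasons for attrition still have to be recorded separately during data collection.

```python
# Hypothetical long-format data: one dict per person-wave; None = missing.
# Person 2 has no row at wave 2 (attrition).
rows = [
    {"id": 1, "wave": 1, "anx": 3, "dep": 2},
    {"id": 2, "wave": 1, "anx": None, "dep": 4},
    {"id": 3, "wave": 1, "anx": 2, "dep": 1},
    {"id": 1, "wave": 2, "anx": 3, "dep": None},
    {"id": 3, "wave": 2, "anx": 1, "dep": 1},
]

def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

def summarize(rows, variables, n_enrolled):
    """Return (count, percent) pairs for retention, complete cases, and missing values."""
    waves = sorted({r["wave"] for r in rows})
    summary = {"retention": {}, "complete": {}, "missing": {}}
    for w in waves:
        at_wave = [r for r in rows if r["wave"] == w]
        # Participants contributing any data at this wave (retention/attrition)
        present = {r["id"] for r in at_wave
                   if any(r[v] is not None for v in variables)}
        # Complete cases: every variable observed at this wave
        n_complete = sum(all(r[v] is not None for v in variables) for r in at_wave)
        summary["retention"][w] = (len(present), pct(len(present), n_enrolled))
        summary["complete"][w] = (n_complete, pct(n_complete, n_enrolled))
    # Missing values per variable, out of all planned person-waves
    n_expected = n_enrolled * len(waves)
    for v in variables:
        n_missing = n_expected - sum(r[v] is not None for r in rows)
        summary["missing"][v] = (n_missing, pct(n_missing, n_expected))
    return summary

for section, values in summarize(rows, ["anx", "dep"], n_enrolled=3).items():
    print(section, values)
```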

Patterns of Missing Data[5]

  • Y: matrix of all the values that could be observed
  • Y_obs: subset of Y values that end up observed
  • Y_miss: subset of Y values that end up missing
  • R: response matrix of dummy-coded missingness indicators showing which Y values are observed (0, part of Y_obs) vs. missing (1, part of Y_miss)

Tip

We can aggregate and visualize R to describe patterns of missingness!
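Building R and aggregating it takes only a few lines. This sketch follows the coding used above (1 = missing, 0 = observed) on a small made-up Y; response_matrix and missingness_patterns are hypothetical helper names. Note that some software uses the opposite coding, so check the convention before comparing output.

```python
from collections import Counter

# Hypothetical Y: rows = cases, columns = variables; None = missing.
Y = [
    [12.1, 150.0, None],
    [10.4, None, None],
    [11.0, 148.2, 3.1],
    [9.8, None, None],
]

def response_matrix(Y):
    """R[i][j] = 1 if Y[i][j] is missing (part of Y_miss), 0 if observed (Y_obs)."""
    return [[1 if value is None else 0 for value in row] for row in Y]

def missingness_patterns(R):
    """Count how many cases share each unique row pattern of R."""
    return Counter(tuple(row) for row in R)

R = response_matrix(Y)
# Most frequent patterns first
for pattern, n_cases in missingness_patterns(R).most_common():
    print(pattern, n_cases)
# Column sums of R give the number of missing values per variable
print([sum(column) for column in zip(*R)])
```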

Example Patterns of Missing Data

Missingness patterns for Dutch boys growth study data (748 boys, 9 variables, 1 time point)[7]

Rubin’s Mechanisms of Missingness[8]

  • Missing completely at random (MCAR)
  • Missing at random (MAR)
  • Missing not at random (MNAR)

Impact on Statistical Results[3]

The potential for bias increases across mechanisms: MCAR < MAR < MNAR
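A small simulation can illustrate this ordering for one common analysis, the complete-case (listwise deletion) mean. Everything here is an assumption chosen for illustration: normally distributed data, a fully observed covariate x correlated with y, and made-up missingness rates. The mechanisms themselves are defined on the next slides.

```python
import random
import statistics

def simulate(mechanism, n=20000, seed=1):
    """Return the complete-case mean of y minus the full-data mean of y.

    x is a fully observed covariate correlated with y; only y can be missing.
    """
    rng = random.Random(seed)
    complete_y, all_y = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 0.7 * x + rng.gauss(0, 0.7)  # y is correlated with x
        if mechanism == "MCAR":
            p_miss = 0.4  # unrelated to x or y
        elif mechanism == "MAR":
            p_miss = 0.7 if x > 0 else 0.1  # depends only on observed x
        elif mechanism == "MNAR":
            p_miss = 0.7 if y > 0 else 0.1  # depends on the value of y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        all_y.append(y)
        if rng.random() >= p_miss:  # y is observed with probability 1 - p_miss
            complete_y.append(y)
    return statistics.mean(complete_y) - statistics.mean(all_y)

for m in ("MCAR", "MAR", "MNAR"):
    print(m, round(simulate(m), 3))
```

The MAR bias in this sketch comes from ignoring x. Methods that use x (such as multiple imputation or full information maximum likelihood) can remove bias under MAR, whereas under MNAR conditioning on observed data alone generally cannot.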

MCAR

MCAR is when neither observed nor unobserved values predict which values are missing.

Diagram (MCAR): Random processes unrelated to Y → Missingness (R). Neither observed values (Y_obs) nor unobserved values (Y_miss) predict R.
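Using the notation from the patterns slide, one common formal statement of MCAR (suppressing the parameters of the missingness model) is that the distribution of R does not depend on Y at all:

\[ P(R \mid Y_{obs}, Y_{miss}) = P(R) \]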

MAR

MAR is when observed values predict which values are missing.

Diagram (MAR): Observed values (Y_obs) → Missingness (R); random processes unrelated to Y → Missingness (R). Unobserved values (Y_miss) do not predict R.
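In the same notation (again suppressing model parameters), MAR means that once the observed values are taken into account, R carries no further dependence on the unobserved values:

\[ P(R \mid Y_{obs}, Y_{miss}) = P(R \mid Y_{obs}) \]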

MNAR

MNAR is when unobserved values predict which values are missing.

Diagram (MNAR): Observed values (Y_obs), random processes unrelated to Y, and unobserved values (Y_miss) all → Missingness (R).
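Formally, MNAR is the case where the MAR equality fails, so R still depends on Y_miss even after conditioning on Y_obs:

\[ P(R \mid Y_{obs}, Y_{miss}) \neq P(R \mid Y_{obs}) \]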

Predictors of Attrition & Missingness in Longitudinal Studies

  • Study arm (compare attrition rates across arms)
  • Study site (in multisite studies)
  • Baseline/pretest values of outcome variables may predict who drops out or has missing values
  • Other covariates (e.g., demographics)
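A first descriptive check of these predictors can be sketched as below, on made-up records. The field names and helpers are hypothetical. In practice one might instead fit a logistic regression of a dropout indicator on arm, site, baseline values, and covariates. Observed-data predictors of dropout make MAR more plausible when they are included in the analysis, but they cannot rule out MNAR.

```python
import statistics

# Hypothetical baseline records: study arm, baseline outcome score,
# and whether the participant dropped out before the final wave.
participants = [
    {"arm": "treatment", "baseline": 12, "dropped": False},
    {"arm": "treatment", "baseline": 18, "dropped": True},
    {"arm": "treatment", "baseline": 10, "dropped": False},
    {"arm": "treatment", "baseline": 11, "dropped": False},
    {"arm": "control", "baseline": 17, "dropped": True},
    {"arm": "control", "baseline": 19, "dropped": True},
    {"arm": "control", "baseline": 9, "dropped": False},
    {"arm": "control", "baseline": 13, "dropped": False},
]

def attrition_rate_by(records, key):
    """Proportion of participants who dropped out within each level of `key`."""
    groups = {}
    for r in records:
        groups.setdefault(r[key], []).append(r["dropped"])
    return {level: sum(flags) / len(flags) for level, flags in groups.items()}

def baseline_mean_by_dropout(records, var):
    """Mean of a baseline variable for completers (False) vs. dropouts (True)."""
    return {
        status: statistics.mean(r[var] for r in records if r["dropped"] == status)
        for status in (False, True)
    }

print(attrition_rate_by(participants, "arm"))
print(baseline_mean_by_dropout(participants, "baseline"))
```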

What should we do about missing data?

Prevention

An ounce of prevention is worth a pound of cure.

[9,10]

Treatment

Reporting

Advice

  • Collaborate with a statistician!

Practical Options

  • Item-level missingness in scale scores[11,12]
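One option often discussed for item-level missingness is to score a scale as the mean of the items a person answered, provided enough items were answered (sometimes called prorating or person-mean scoring). The sketch assumes a hypothetical 50% threshold and a hypothetical helper name; whether this is reasonable depends on the scale, and the cited sources discuss when it is defensible and alternatives such as item-level imputation.

```python
from typing import Optional, Sequence

def prorated_scale_score(items: Sequence[Optional[float]],
                         min_prop: float = 0.5) -> Optional[float]:
    """Mean of the answered items, or None if too few items were answered.

    min_prop is a hypothetical threshold (default: at least half the items).
    """
    answered = [v for v in items if v is not None]
    if not items or len(answered) / len(items) < min_prop:
        return None  # treat the whole scale score as missing
    return sum(answered) / len(answered)

print(prorated_scale_score([4, 5, None, 3]))        # 4.0
print(prorated_scale_score([4, None, None, None]))  # None
```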

References

1. Fernández-García, M. P., Vallejo-Seco, G., Livácic-Rojas, P., & Tuero-Herrero, E. (2018). The (ir)responsibility of (under)estimating missing data. Frontiers in Psychology, 9(556). https://doi.org/10.3389/fpsyg.2018.00556
2. American Evaluation Association. (2018). Guiding principles for evaluators [Web Page]. Author. https://www.eval.org/About/Guiding-Principles
3. McKnight, P. E., McKnight, K. M., Sidani, S., & Figueredo, A. J. (2007). Missing data: A gentle introduction. Guilford Press.
4. Cattell, R. B. (1966). The data box: Its ordering of total resources in terms of possible relations systems. In R. B. Cattell (Ed.), Handbook of multivariate experimental psychology (pp. 67–128). Rand McNally.
5. Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177. https://doi.org/10.1037/1082-989X.7.2.147
6. van Buuren, S. (2018). Flexible imputation of missing data (2nd ed.). Chapman & Hall/CRC Press. https://doi.org/10.1201/9780429492259
7. Fredriks, A. M., van Buuren, S., Burgmeijer, R. J. F., Meulmeester, J. F., Beuker, R. J., Brugman, E., Roede, M. J., Verloove-Vanhorick, S. P., & Wit, J.-M. (2000). Continuing positive secular growth change in the Netherlands 1955–1997. Pediatric Research, 47, 316–323. https://doi.org/10.1203/00006450-200003000-00006
8. Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592. https://doi.org/10.1093/biomet/63.3.581
9. de Leeuw, E. D. (2001). Reducing missing data in surveys: An overview of methods. Quality & Quantity, 35(2), 147–160. https://doi.org/10.1023/A:1010395805406
10. Wisniewski, S. R., Leon, A. C., Otto, M. W., & Trivedi, M. H. (2006). Prevention of missing data in clinical research studies. Biological Psychiatry, 59, 997–1000. https://doi.org/10.1016/j.biopsych.2006.01.017
11. Graham, J. W. (2009). Missing data analysis: Making it work in the real world. Annual Review of Psychology, 60, 549–576. https://doi.org/10.1146/annurev.psych.58.110405.085530
12. Newman, D. A. (2014). Missing data: Five practical guidelines. Organizational Research Methods, 17(4), 372–411. https://doi.org/10.1177/1094428114548590